ABSTRACT

It is well known that the standard estimator for variance components in analysis of variance, Model II, $\hat{\sigma}_a^2$, can be negative with positive probability. In practice, when such an estimator is found to be negative it is taken to be zero. Very little is known about the properties of the corresponding truncated estimator. This thesis investigates the variance and bias of the positive truncated estimator $\tilde{\sigma}_a^2$. A method of selecting $\ell$, the number of classes, is presented that produces maximum power for a test of the hypothesis that $\sigma_a^2 = 0$ while keeping the variance and bias as small as possible.
I. INTRODUCTION

The use of Model II, or the Random Effects Model, in analysis of variance can best be described by a simple example. Suppose we draw a sample of $\ell$ pieces of steel from the population of pieces which have been subjected to a particular annealing process. These $\ell$ pieces may be considered as a random sample from the population composed of all such pieces of steel which have been or will be produced by this specific process. We might wish to determine the variation of flexural rigidity after the annealing process between the various members of the whole population. If exact measurements of flexural rigidity could be taken from the pieces on hand, the variance could be derived by straightforward statistical methods. However, the experimental methods used to measure flexural rigidity are subject to error. This error is reflected in the fact that if several measurements are taken from one piece of steel, the results are not always exactly the same. In fact, it may be the case that the measurement (experimental) errors are of the same or greater magnitude than the variation we wish to measure between the true rigidities of the different pieces. Using Model II analysis of variance, it is possible to separate and isolate these two different causes of variation and to obtain an estimate of the true variation of rigidity.

The data for such an analysis will consist of several different measurements of flexural rigidity taken from each of the $\ell$ pieces of steel. If we take $r$ measurements on each of the $\ell$ pieces, then the total number of measurements will be $\ell r = N$.

The Model II analysis of variance now takes the form

$$Y_{ij} = \mu + a_i + e_{ij}, \qquad i = 1, 2, \ldots, \ell; \quad j = 1, 2, \ldots, r. \tag{1.1}$$

The following assumptions and definitions are standard for this model:

$Y_{ij}$ represents the $j$th measurement of the $i$th piece of steel,

$\mu$ is the "true" mean flexural rigidity of the population and is assumed constant,

$a_i$ is the deviation from the mean of the $i$th piece. The $a_i$ are assumed to be distributed $N(0, \sigma_a^2)$,

$e_{ij}$ is the measurement error of the $j$th measurement on the $i$th piece. The $e_{ij}$ are assumed to be distributed $N(0, \sigma_e^2)$,

$a_i$ and $e_{ij}$ are assumed independent.

For the balanced one-way classification Model II analysis of variance just described, it is well known that the minimum variance unbiased estimator for the true variation between the pieces is

$$\hat{\sigma}_a^2 = \frac{MS_a - MS_e}{r} \tag{1.2}$$

where

$$MS_a = \sum_{i=1}^{\ell} r\,(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \big/ (\ell - 1)$$

and

$$MS_e = \sum_{i=1}^{\ell} \sum_{j=1}^{r} (Y_{ij} - \bar{y}_{i\cdot})^2 \big/ \ell(r-1).$$


Leone and Nelson [Ref. 3] found from an empirical study that this estimator can be negative with probability as high as .40. In practical applications the estimator is taken as zero whenever it is negative. This then produces a truncation of the true distribution of the minimum variance unbiased estimator. The truncated estimator takes the form

$$\tilde{\sigma}_a^2 = \begin{cases} \dfrac{MS_a - MS_e}{r} & \text{if } MS_a > MS_e \\ 0 & \text{otherwise.} \end{cases} \tag{1.3}$$

The properties of this truncated estimator are unknown at present.
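To make the two estimators concrete, the following is a minimal sketch, not part of the original thesis, that computes $MS_a$, $MS_e$, the unbiased estimator of Eq. (1.2) and its truncated form of Eq. (1.3) from a balanced data set. The function names, the simulation helper, the seed, and the parameter values in the demonstration are all illustrative assumptions.

```python
import random


def variance_component_estimates(data):
    """Return (MS_a, MS_e, untruncated, truncated) for balanced one-way data.

    `data` is a list of l classes, each a list of r measurements.
    """
    l = len(data)
    r = len(data[0])
    class_means = [sum(row) / r for row in data]
    grand_mean = sum(class_means) / l
    # Between-class mean square, l - 1 degrees of freedom.
    ms_a = r * sum((m - grand_mean) ** 2 for m in class_means) / (l - 1)
    # Within-class mean square, l(r - 1) degrees of freedom.
    ss_e = sum((y - m) ** 2 for row, m in zip(data, class_means) for y in row)
    ms_e = ss_e / (l * (r - 1))
    untruncated = (ms_a - ms_e) / r      # Eq. (1.2)
    truncated = max(untruncated, 0.0)    # Eq. (1.3)
    return ms_a, ms_e, untruncated, truncated


def simulate_model_ii(l, r, mu, var_a, var_e, rng):
    """Draw one balanced data set from Y_ij = mu + a_i + e_ij (Eq. 1.1)."""
    data = []
    for _ in range(l):
        a_i = rng.gauss(0.0, var_a ** 0.5)
        data.append([mu + a_i + rng.gauss(0.0, var_e ** 0.5) for _ in range(r)])
    return data


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed, chosen only for repeatability
    runs = 2000
    negative = 0
    for _ in range(runs):
        sample = simulate_model_ii(l=4, r=3, mu=10.0, var_a=0.1, var_e=1.0, rng=rng)
        if variance_component_estimates(sample)[2] < 0.0:
            negative += 1
    print("fraction of negative untruncated estimates:", negative / runs)
```

Running the demonstration with a small ratio $\sigma_a^2/\sigma_e^2$ gives a rough feel for how often the untruncated estimate falls below zero; the exact fraction depends on the seed and on the assumed parameter values.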

Consider a situation where N, the total number of experiments, is fixed. Within this framework, this paper is concerned with an empirical investigation of the properties of the estimator $\tilde{\sigma}_a^2$.

The following questions are considered:

1. What can be said concerning the effects of various choices of $\ell$ and $r$ on the bias and variance of $\tilde{\sigma}_a^2$?

2. How does the variance of $\tilde{\sigma}_a^2$ compare with the variance of $\hat{\sigma}_a^2$ for a given N and $\ell$?

3. Can an allocation method for $\ell$ be found to yield minimum variance or minimum bias for $\tilde{\sigma}_a^2$?

4. If such an allocation method can be developed, how does it compare with the allocation formula for $r$ developed by Hammersley [Ref. 1] to minimize the variance of $\hat{\sigma}_a^2$ when $K = \sigma_a^2/\sigma_e^2$ is known?

5. If nothing is known about K, is there a "best" allocation method for $\ell$?

6. If we are testing the null hypothesis $H_0$: $\sigma_a^2 = 0$ against $H_1$: $\sigma_a^2 \neq 0$, how does the allocation of $\ell$ affect the power of this test?


II. METHODOLOGY

A. THEORY

In order to investigate the behavior of the estimator it is first necessary to develop the distribution of $\tilde{\sigma}_a^2$ and expressions for the bias and variance of this estimator.

1. Distribution of $\tilde{\sigma}_a^2$


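Let U and V be two independent random variables with Pearson Type III distributions so that

$$f(u) = \begin{cases} \dfrac{\gamma_1^{\tau_1+1}}{\Gamma(\tau_1+1)}\, u^{\tau_1} e^{-\gamma_1 u} & \text{if } u > 0 \\ 0 & \text{otherwise} \end{cases}$$

and

$$f(v) = \begin{cases} \dfrac{\gamma_2^{\tau_2+1}}{\Gamma(\tau_2+1)}\, v^{\tau_2} e^{-\gamma_2 v} & \text{if } v > 0 \\ 0 & \text{otherwise.} \end{cases}$$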
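Pearson [Ref. 4] found the distribution of the difference of two such variables. Written for $Y = V - U$, the form that agrees with Eqs. (2.2) through (2.6) below, the density is

$$g(y) = \frac{\gamma_1^{\tau_1+1}\gamma_2^{\tau_2+1}}{\Gamma(\tau_2+1)\,(\gamma_1+\gamma_2)^{\tau_1+1}}\, e^{-\gamma_2 y}\, y^{\tau_2} \left[1 + \frac{\tau_2(\tau_1+1)}{1!\,(\gamma_1+\gamma_2)\,y} + \frac{\tau_2(\tau_2-1)(\tau_1+1)(\tau_1+2)}{2!\,(\gamma_1+\gamma_2)^2\,y^2} + \cdots\right] \quad \text{if } y > 0 \tag{2.1a}$$

$$g(y) = \frac{\gamma_1^{\tau_1+1}\gamma_2^{\tau_2+1}}{\Gamma(\tau_1+1)\,(\gamma_1+\gamma_2)^{\tau_2+1}}\, e^{\gamma_1 y}\, (-y)^{\tau_1} \left[1 + \frac{\tau_1(\tau_2+1)}{1!\,(\gamma_1+\gamma_2)(-y)} + \frac{\tau_1(\tau_1-1)(\tau_2+1)(\tau_2+2)}{2!\,(\gamma_1+\gamma_2)^2\,(-y)^2} + \cdots\right] \quad \text{if } y < 0. \tag{2.1b}$$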
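He also showed that

$$\int_0^\infty g(y)\,dy = I_{\frac{\gamma_1}{\gamma_1+\gamma_2}}(\tau_1+1,\ \tau_2+1), \tag{2.2}$$

$$\int_0^\infty y\,g(y)\,dy = \frac{\tau_2+1}{\gamma_2}\, I_{\frac{\gamma_1}{\gamma_1+\gamma_2}}(\tau_1+1,\ \tau_2+1) - \frac{\tau_1+1}{\gamma_1}\, I_{\frac{\gamma_1}{\gamma_1+\gamma_2}}(\tau_1+2,\ \tau_2), \tag{2.3}$$

and

$$\int_0^\infty e^{ty} g(y)\,dy = \left(1 - \frac{t}{\gamma_2}\right)^{-(\tau_2+1)} \left(1 + \frac{t}{\gamma_1}\right)^{-(\tau_1+1)} I_{\frac{\gamma_1+t}{\gamma_1+\gamma_2}}(\tau_1+1,\ \tau_2+1), \tag{2.4}$$

where $I_x(p,q)$ is the ratio of the incomplete beta function to the complete beta function.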

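If we choose

$$\gamma_1 = \frac{1}{2b}, \quad \tau_1 = \frac{m}{2} - 1, \quad \gamma_2 = \frac{1}{2a}, \quad \text{and} \quad \tau_2 = \frac{n}{2} - 1, \tag{2.5}$$

and substitute these values into (2.1), we obtain the density function of $Y = aX_1 - bX_2$, where $X_1$ and $X_2$ are independent chi-square variables with n and m degrees of freedom respectively and a and b are positive constants. This density function becomes

$$g(y) = \begin{cases} \dfrac{e^{-y/2a}\, y^{\frac{n}{2}-1}}{\Gamma\!\left(\frac{n}{2}\right) 2^{\frac{n}{2}}\, a^{\frac{n-m}{2}}\, (a+b)^{\frac{m}{2}}}\ {}_2F_0\!\left[1-\dfrac{n}{2},\ \dfrac{m}{2};\ \dfrac{-2ab}{(a+b)\,y}\right], & y > 0 \\[3ex] \dfrac{e^{y/2b}\, (-y)^{\frac{m}{2}-1}}{\Gamma\!\left(\frac{m}{2}\right) 2^{\frac{m}{2}}\, b^{\frac{m-n}{2}}\, (a+b)^{\frac{n}{2}}}\ {}_2F_0\!\left[1-\dfrac{m}{2},\ \dfrac{n}{2};\ \dfrac{2ab}{(a+b)\,y}\right], & y < 0 \end{cases} \tag{2.6}$$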


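where

$${}_2F_0[p, q; x] = \sum_{n=0}^{\infty} (p)_n\,(q)_n\, \frac{x^n}{n!}$$

with

$$(a)_n = a(a+1)\cdots(a+n-1) = \frac{\Gamma(a+n)}{\Gamma(a)}.$$

Now let

$$Y^+ = \begin{cases} Y & \text{if } Y > 0 \\ 0 & \text{otherwise.} \end{cases}$$

The distribution for $Y^+$ will be

$$H_{Y^+}(y) = \begin{cases} 0, & y < 0 \\[1ex] \displaystyle\int_{-\infty}^{0} g(t)\,dt, & y = 0 \\[1ex] \displaystyle\int_{-\infty}^{y} g(t)\,dt, & y > 0, \end{cases} \tag{2.7}$$

and using Eq. (2.6) we get the density function

$$h_{Y^+}(y) = \begin{cases} I_{\frac{b}{a+b}}\!\left(\dfrac{n}{2},\ \dfrac{m}{2}\right), & y = 0 \\[1ex] g(y), & y > 0 \\[1ex] 0, & \text{otherwise.} \end{cases} \tag{2.8}$$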

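From equations (2.3), (2.4), (2.5), and (2.8) it follows that

$$E(Y^+) = na\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2},\ \frac{n}{2}\right) - mb\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2}+1,\ \frac{n}{2}-1\right), \tag{2.9}$$

and

$$E\!\left(e^{tY^+}\right) = I_{\frac{b}{a+b}}\!\left(\frac{n}{2},\ \frac{m}{2}\right) + (1-2at)^{-\frac{n}{2}}\,(1+2bt)^{-\frac{m}{2}}\, I_{\frac{a(1+2bt)}{a+b}}\!\left(\frac{m}{2},\ \frac{n}{2}\right). \tag{2.10}$$

The distribution of $\tilde{\sigma}_a^2$ is the same as that of $Y^+$ with

$$n = \ell - 1, \quad m = \ell(r-1), \quad a = \frac{\sigma_e^2 + r\sigma_a^2}{r(\ell-1)}, \quad \text{and} \quad b = \frac{\sigma_e^2}{r\ell(r-1)}. \tag{2.11}$$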

2. Variance and Bias of $\tilde{\sigma}_a^2$

As indicated above, the distribution of $\tilde{\sigma}_a^2$ is the same as the distribution of $Y^+$ for the proper choice of n, m, a, and b. Thus, Eqs. (2.9) and (2.10) give the expected value of $\tilde{\sigma}_a^2$ and its moment generating function when $\tilde{\sigma}_a^2$ is substituted for $Y^+$ and m, n, a, and b are defined as in Eq. (2.11).

The variance of $\tilde{\sigma}_a^2$ can now be derived by recalling that

$$\mathrm{Var}(\tilde{\sigma}_a^2) = \left[\frac{d^2 M}{dt^2}\right]_{t=0} - \left[\frac{dM}{dt}\right]_{t=0}^{2}, \tag{2.12}$$

where $M = E\!\left(e^{t\tilde{\sigma}_a^2}\right)$.

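Applying successive derivatives to Eq. (2.10) and evaluating at t = 0, the expected value of $\tilde{\sigma}_a^2$ is found to be

$$E(\tilde{\sigma}_a^2) = \left.\frac{dM}{dt}\right|_{t=0} = na\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2},\ \frac{n}{2}\right) - mb\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2},\ \frac{n}{2}\right) + \frac{2ab}{a+b} \left(\frac{a}{a+b}\right)^{\frac{m}{2}-1} \left(\frac{b}{a+b}\right)^{\frac{n}{2}-1} \Big/ \beta\!\left(\frac{m}{2},\ \frac{n}{2}\right), \tag{2.13}$$

where $\beta\!\left(\frac{m}{2}, \frac{n}{2}\right)$ is the beta function with parameters m/2 and n/2, and

$$\left.\frac{d^2 M}{dt^2}\right|_{t=0} = \left(a^2n^2 + 2a^2n - 2abmn + 2b^2m + b^2m^2\right) I_{\frac{a}{a+b}}\!\left(\frac{m}{2},\ \frac{n}{2}\right) + \frac{2ab}{a+b}\,(an - bm + 2a - 2b) \left(\frac{a}{a+b}\right)^{\frac{m}{2}-1} \left(\frac{b}{a+b}\right)^{\frac{n}{2}-1} \Big/ \beta\!\left(\frac{m}{2},\ \frac{n}{2}\right). \tag{2.14}$$

Equation (2.13) can be shown to be equivalent to Eq. (2.9).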

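Squaring Eq. (2.13) and subtracting from Eq. (2.14), the variance of $\tilde{\sigma}_a^2$ is obtained as

$$\begin{aligned} \mathrm{Var}(\tilde{\sigma}_a^2) ={} & 2\,(na^2 + mb^2)\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2},\ \frac{n}{2}\right) \\ & + (na - mb)\left[na\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2},\ \frac{n}{2}\right) - mb\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2}+1,\ \frac{n}{2}-1\right)\right] \\ & + 4 \left(\frac{a}{a+b}\right)^{\frac{m}{2}} \left(\frac{b}{a+b}\right)^{\frac{n}{2}-1} b\,(a-b) \Big/ \beta\!\left(\frac{m}{2},\ \frac{n}{2}\right) \\ & - \left[na\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2},\ \frac{n}{2}\right) - mb\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2}+1,\ \frac{n}{2}-1\right)\right]^2. \end{aligned} \tag{2.15}$$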


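From Eq. (2.9), the bias of the estimator is given by

$$\text{bias} = na\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2},\ \frac{n}{2}\right) - mb\, I_{\frac{a}{a+b}}\!\left(\frac{m}{2}+1,\ \frac{n}{2}-1\right) - \sigma_a^2. \tag{2.16}$$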
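As an illustration of how Eqs. (2.9), (2.11) and (2.16) combine, the following sketch evaluates the expected value and bias of the truncated estimator for a given N, $\ell$, $\sigma_a^2$ and $\sigma_e^2$. It is not the thesis program of Appendix A: the function names are hypothetical, and the regularized incomplete beta function $I_x(p,q)$ is computed with a standard continued-fraction expansion written here as an assumption, in place of the IBM library routine.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction used by betainc (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the recurrence.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def truncated_mean_and_bias(N, l, var_a, var_e=1.0):
    """Return (E[truncated estimator], bias) from Eqs. (2.9) and (2.16)."""
    if N % l != 0 or l < 4 or N // l < 2:
        raise ValueError("need l >= 4, l dividing N, and r = N/l >= 2")
    r = N // l
    n = l - 1                                 # Eq. (2.11)
    m = l * (r - 1)
    a = (var_e + r * var_a) / (r * (l - 1))
    b = var_e / (r * l * (r - 1))
    x = a / (a + b)
    mean = (n * a * betainc(m / 2, n / 2, x)
            - m * b * betainc(m / 2 + 1, n / 2 - 1, x))
    return mean, mean - var_a


if __name__ == "__main__":
    for N in (12, 16):
        print(N, round(truncated_mean_and_bias(N, 4, 0.1)[1], 4))
```

The restriction to $\ell \ge 4$ mirrors the range used in this study and keeps the second beta parameter $n/2 - 1$ positive.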

The expression for the variance of $\hat{\sigma}_a^2$ is

$$\mathrm{Var}(\hat{\sigma}_a^2) = \frac{2}{r^2}\left[\frac{(\sigma_e^2 + r\sigma_a^2)^2}{\ell - 1} + \frac{\sigma_e^4}{\ell(r-1)}\right]. \tag{2.17}$$

Thus, values for the variance of the minimum variance unbiased estimator, $\hat{\sigma}_a^2$, can be computed for a comparison with the variance of $\tilde{\sigma}_a^2$ for fixed N and $\ell$.
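Equation (2.17) is simple enough to evaluate directly. The sketch below is a hypothetical helper, not taken from Appendix A, that computes the variance of the untruncated estimator for given N, $\ell$, $\sigma_a^2$ and $\sigma_e^2$, with $r = N/\ell$.

```python
def untruncated_variance(N, l, var_a, var_e=1.0):
    """Variance of the unbiased estimator, Eq. (2.17), with r = N / l."""
    if N % l != 0 or not 2 <= l <= N // 2:
        raise ValueError("l must divide N and satisfy 2 <= l <= N/2")
    r = N // l
    between = (var_e + r * var_a) ** 2 / (l - 1)
    within = var_e ** 2 / (l * (r - 1))
    return (2.0 / r ** 2) * (between + within)


if __name__ == "__main__":
    # N = 12, l = 4 and N = 100, l = 20, with sigma_e^2 = 1.
    print(round(untruncated_variance(12, 4, 0.1), 4))
    print(round(untruncated_variance(100, 20, 1.0), 4))
```

With $\sigma_e^2 = 1$ the argument `var_a` plays the role of the ratio K, so the two calls in the demonstration correspond to the settings (N = 12, $\ell$ = 4, K = 0.1) and (N = 100, $\ell$ = 20, K = 1.0).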

3. Power

In considering the problem of selecting an $\ell$ for a fixed N when testing a given hypothesis based on the sample, the power of the test is an important consideration. Suppose the null hypothesis $H_0$: $\sigma_a^2 = 0$ is being tested against the alternative hypothesis $H_1$: $\sigma_a^2 \neq 0$ from a sample of $\ell$ classes, each class consisting of r observations. From the analysis of variance table, the test statistic is found to be $\mathcal{F} = MS_a/MS_e$, where

$$MS_a = \frac{SS_a}{\ell - 1} \quad \text{and} \quad MS_e = \frac{SS_e}{\ell(r-1)}.$$

It may be shown that $SS_a/(\sigma_e^2 + r\sigma_a^2)$ and $SS_e/\sigma_e^2$ are independent chi-square variables with $\ell - 1$ and $\ell(r-1)$ degrees of freedom respectively. The statistic $MS_a/MS_e$ may now be rewritten as

$$\mathcal{F} = \frac{\sigma_e^2 + r\sigma_a^2}{\sigma_e^2} \cdot \frac{\left(\dfrac{SS_a}{\sigma_e^2 + r\sigma_a^2}\right)\Big/(\ell-1)}{\left(\dfrac{SS_e}{\sigma_e^2}\right)\Big/\ell(r-1)} \tag{2.18}$$

which is a constant times a ratio of independent chi-square variables divided by their degrees of freedom, and is therefore distributed as

$$\frac{\sigma_e^2 + r\sigma_a^2}{\sigma_e^2}\, F_{(\ell-1),\,\ell(r-1)},$$

where $F_{a,b}$ is a central F variable with a and b degrees of freedom.

If $H_0$ is true, i.e., $\sigma_a^2 = 0$, the test statistic

$$\mathcal{F} = \frac{SS_a/(\ell-1)}{SS_e/\ell(r-1)}$$

is distributed as $F_{(\ell-1),\,\ell(r-1)}$.

Thus, a test of the null hypothesis consists of rejecting $H_0$ at a level of significance $\alpha$ if

$$\mathcal{F} \ge F_{\alpha;(\ell-1),\,\ell(r-1)}.$$

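The power of this test, denoted $\beta(\theta)$, is given by

$$\beta(\theta) = P\!\left[\frac{MS_a}{MS_e} \ge F_{\alpha;(\ell-1),\,\ell(r-1)}\right], \quad \text{where } \theta = \frac{\sigma_a^2}{\sigma_e^2}.$$

But if $\sigma_a^2 \neq 0$, then

$$\beta(\theta) = P\!\left[\frac{MS_a}{MS_e} \cdot \frac{\sigma_e^2}{\sigma_e^2 + r\sigma_a^2} \ge \frac{\sigma_e^2\, F_{\alpha;(\ell-1),\,\ell(r-1)}}{\sigma_e^2 + r\sigma_a^2}\right] = P\!\left[F_{(\ell-1),\,\ell(r-1)} \ge \frac{\sigma_e^2\, F_{\alpha;(\ell-1),\,\ell(r-1)}}{\sigma_e^2 + r\sigma_a^2}\right].$$

This expression can be evaluated after the transformation

$$X = \frac{1}{1 + \dfrac{\gamma_1}{\gamma_2}\, F_{\gamma_1,\gamma_2}}.$$

The variable X is distributed as $\beta(\gamma_1, \gamma_2)$. Thus

$$\beta(\theta) = P\!\left[\frac{1}{1 + \dfrac{\ell-1}{\ell(r-1)}\, F_{\ell-1,\,\ell(r-1)}} \le \frac{1}{1 + \dfrac{(\ell-1)\,\sigma_e^2\, F_{\alpha;\ell-1,\,\ell(r-1)}}{\ell(r-1)\,(\sigma_e^2 + r\sigma_a^2)}}\right]$$

or

$$\beta(\theta) = I_X[\ell-1,\ \ell(r-1)] \tag{2.19}$$

where

$$X = \frac{1}{1 + \dfrac{(\ell-1)\,\sigma_e^2\, F_{\alpha;(\ell-1),\,\ell(r-1)}}{\ell(r-1)\,(\sigma_e^2 + r\sigma_a^2)}}$$

yields the power of the test of hypothesis $H_0$: $\sigma_a^2 = 0$, for a specified $\alpha$ and N.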


B. DATA GENERATION

From Eqs. (2.9), (2.15), (2.16), and (2.19) it can be seen that each of the properties to be analyzed is dependent on four variables: $\sigma_e^2$, $\sigma_a^2$, r and $\ell$. Recall that $N = \ell r$ is the total number of experiments to be conducted. If N is fixed the choice of either $\ell$ or r determines the other. Thus, for fixed N, we have only three variables, $\ell$, $\sigma_e^2$ and $\sigma_a^2$. We now wish to see what happens to the bias and variance of $\tilde{\sigma}_a^2$ and the power of the specified test of hypothesis as the three variables take on a range of values.


Calculation of these statistics was done on an IBM 360/67 computer using the basic program shown in Appendix A. In addition to this basic program, the IBM-supplied subroutines for computation of the beta distribution were also used.

The values chosen for N were 12, 16, 20, 40, 50, 80 and 100. For each value of N, $\ell$ varied through all integral divisors of N such that $4 \le \ell \le N/2$. For example, for N = 80, $\ell$ took on the values 4, 5, 8, 10, 16, 20, and 40.
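The rule for admissible class counts can be written out directly. The short sketch below uses a hypothetical function name and simply lists every integral divisor of N between 4 and N/2, as the rule is stated in the text.

```python
def candidate_class_counts(N):
    """All integral divisors l of N with 4 <= l <= N/2."""
    return [l for l in range(4, N // 2 + 1) if N % l == 0]


if __name__ == "__main__":
    for N in (12, 16, 20):
        print(N, candidate_class_counts(N))
```

For each admissible $\ell$ the number of observations per class is then $r = N/\ell$, so every candidate has at least two observations per class.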

Initially, for each combination of N and $\ell$, both $\sigma_e^2$ and $\sigma_a^2$ were varied from .1 to 2.0 in steps of .1 and again from 1.0 to 20.0 in steps of 1.0. Values were computed for the variance and bias of $\tilde{\sigma}_a^2$, the variance of $\hat{\sigma}_a^2$ and the power of the specified test of hypothesis for all possible combinations of N, $\ell$, $\sigma_e^2$, and $\sigma_a^2$ in the ranges described.

The data generated in this manner supported the contention of Scheffé that $\sigma_e^2$ is simply a scaling factor for both the bias and variance of $\tilde{\sigma}_a^2$ and the variance of $\hat{\sigma}_a^2$. Figures 1 and 2 illustrate the scaling influence of $\sigma_e^2$ on the bias and variance of $\tilde{\sigma}_a^2$ when N = 20 and $\ell$ = 4 and 5.

As for the power of the test of hypothesis, Scheffé has shown $\beta(\theta)$ to be dependent only upon the ratio $\sigma_a^2/\sigma_e^2$, and again $\sigma_e^2$ can be considered as a scaling factor.

Based on these considerations, $\sigma_e^2$ was set at one for all data generated for use in this thesis. This greatly reduced the amount of time and output required for computer runs and further reduced the number of input variables to two, $\ell$ and $\sigma_a^2$, for each fixed N. Further, if $\sigma_e^2 = 1$, the value of $\sigma_a^2$ is also the value of the ratio $K = \sigma_a^2/\sigma_e^2$. Since $\sigma_e^2$ is a scaling factor, all conclusions drawn using $\sigma_e^2 = 1$ are equally valid for any other $\sigma_e^2$. The direction of change for fixed $\sigma_a^2$ is as follows: as $\sigma_e^2$ increases (decreases) the variance and bias increase (decrease) and the power of the test of hypothesis decreases (increases). The magnitude of the change depends on the magnitudes of $\sigma_e^2$ and $\sigma_a^2$, N and $\ell$.

Figure 1. Graph showing the effects of the scaling influence of $\sigma_e^2$ on the variance of $\tilde{\sigma}_a^2$ for N = 20. The three upper curves are for $\ell$ = 4, and the three lower for $\ell$ = 5.

Figure 2. Graph showing the scaling effects of $\sigma_e^2$ on the bias of $\tilde{\sigma}_a^2$ for N = 20. The top curve in each pair is for $\ell$ = 4, and the bottom curve for $\ell$ = 5.

e      a 

In  order  to  evaluate  the  power  of  the  test  of  hypothesis, 
it  was  necessary  to  choose  a  level  of  significance,  alpha. 
An  alpha  of  .05  was  used  throughout  this  paper. 

Wang [Ref. 8] has conducted a similar study of the bias and variance of several estimators, including $\tilde{\sigma}_a^2$. Her study was restricted to the special case where $X_1$ and $X_2$ took on only even degrees of freedom and N took on values of 9, 27, 81 and 225. There are no direct points of comparison between the data she generated and the data in this thesis. However, a very favorable comparison of computed variance and bias exists for values of N and $\ell$ as nearly matching as is possible. Wang's variance and bias expressions for N = 81, $\ell$ = 9, and $\sigma_a^2$ = 1.0 yield var($\tilde{\sigma}_a^2$) = .309 and bias($\tilde{\sigma}_a^2$) = 0, while the data from this thesis for N = 80, $\ell$ = 10, and $\sigma_a^2$ = 1.0 yield var($\tilde{\sigma}_a^2$) = .282 and bias($\tilde{\sigma}_a^2$) = 0.
III. PRESENTATION OF FINDINGS

A. GENERAL OBSERVATIONS

Before attempting to address the specific question of the selection of $\ell$, several general conclusions can be drawn from the data regarding the bias of $\tilde{\sigma}_a^2$, the variance of $\tilde{\sigma}_a^2$ and $\hat{\sigma}_a^2$, and the power of the specified test of hypothesis.

Generally speaking, the bias of $\tilde{\sigma}_a^2$ is very small. As shown in Table I, the bias decreases as K, the ratio $\sigma_a^2/\sigma_e^2$, increases. For small N and small K, the bias is significant. However, if K is greater than 1.0 and N is greater than or equal to 20, the magnitude of the bias is so small that it can be neglected. For this range of N and K, the maximum value the bias assumed is less than one percent of the true value of $\sigma_a^2$. Thus, $\tilde{\sigma}_a^2$ is virtually unbiased in this range.

The bias of $\tilde{\sigma}_a^2$ was found to be negative for many combinations of N, $\ell$, and K. However, for the entire range of input variables for which data was generated, the negative bias was always insignificant to the fourth decimal place.
As shown in Table II, the variances of $\tilde{\sigma}_a^2$ and $\hat{\sigma}_a^2$ are very nearly the same except when K is small. For small values of K there is a significant difference between the two. However, this difference decreases sharply as K increases and is negligible for K > 1.0. The difference between the variances is further decreased as N increases. Thus the variances of $\tilde{\sigma}_a^2$ and $\hat{\sigma}_a^2$ appear to approach each other asymptotically as N and/or K increase.
TABLE I

EFFECT OF N AND K ON THE BIAS OF $\tilde{\sigma}_a^2$

($\sigma_a^2 = K$, $\ell = 4$, $\sigma_e^2 = 1.0$)

  N      K     BIAS          N      K     BIAS

 12     .1    .0951         40     .1    .0153
        .5    .0478                .5    .0037
       1.0    .0270               1.0    .0016
       2.0    .0129               2.0    .0006

 16     .1    .0627         80     .1    .0045
        .5    .0265                .5    .0010
       1.0    .0138               1.0    .0005
       2.0    .0061               2.0    .0001

 20     .1    .0452        100     .1    .0029
        .5    .0167                .5    .0004
       1.0    .0081               1.0    .0002
       2.0    .0035               2.0    .0001

Table II also indicates that the variance of $\tilde{\sigma}_a^2$ is always less than or equal to the variance of $\hat{\sigma}_a^2$. The introduction of a small amount of bias by truncation of the estimator tends to reduce the variance.

As is to be expected, the power of the test of hypothesis increases as N increases. In the model proposed here, power is also a function of $\ell$ when N is fixed. For all values of K tested, it was found that if N > 16 and $\ell < N/2$, $\beta(\theta) \ge .9996$. This implies that for values of N > 16 and $\ell < N/2$, the power criterion can be ignored in the selection of $\ell$. Attention can then be directed to minimizing variance and/or bias in selecting $\ell$ with the assurance of a very strong test of the hypothesis.

TABLE II

DIFFERENCES IN TRUNCATED ($\tilde{\sigma}_a^2$) AND UNTRUNCATED ($\hat{\sigma}_a^2$) VARIANCE ESTIMATORS FOR VARIOUS VALUES OF N, K, AND ARBITRARY $\ell$

   N        K      l    VAR($\tilde{\sigma}_a^2$)   VAR($\hat{\sigma}_a^2$)

  12    0.1000     4         0.0648         0.1530
  12    0.5000     4         0.3461         0.4907
  12    1.0000     4         1.0158         1.2130
  12    2.0000     4         3.3828         3.6574
  12    5.0000     4        18.5601        18.9907
  20    0.1000     5         0.0336         0.0696
  20    0.5000     5         0.2376         0.2896
  20    1.0000     5         0.7305         0.7896
  20    2.0000     5         2.4754         2.5396
  20    5.0000     5        13.7216        13.7896
  40    0.1000     8         0.0170         0.0282
  40    0.5000     8         0.1347         0.1425
  40    1.0000     8         0.4093         0.4139
  40    2.0000     8         1.3831         1.3854
  40    5.0000     8         7.7275         7.7282
 100    0.1000    20         0.0081         0.0105
 100    0.5000    20         0.0525         0.0526
 100    1.0000    20         0.1526         0.1526
 100    2.0000    20         0.5105         0.5105
 100    5.0000    20         2.8473         2.8473

B. THE SELECTION OF $\ell$ FOR FIXED N

In the model being studied, it has been assumed that the number of observations of the random variable being observed is fixed at N. Further, it is assumed that r observations will be made on each of $\ell$ classes of the observable phenomenon so that $\ell r = N$. The problem now arises of how to choose $\ell$ (or r) so as to obtain the best statistical results. The problem is complicated by the fact that the "best" solution is dependent on the desired result of the analysis. For example, the $\ell$ that provides the most powerful test of hypothesis for a given N may very well produce maximum bias in our estimate of $\sigma_a^2$. In the same manner, the $\ell$ that provides minimum bias or minimum variance in the estimator may produce a very weak test of the hypothesis that $\sigma_a^2 = 0$.

The selection of $\ell$ is further complicated by the fact that all of the parameters of interest are dependent on K.
Hammersley [Ref. 1] developed an expression for r which produces minimum variance in $\hat{\sigma}_a^2$, the unbiased estimator of $\sigma_a^2$. Equating the first derivative of the expression for the variance of $\hat{\sigma}_a^2$ to zero, Hammersley showed that the integral divisor of N that most nearly satisfies

$$r_h = \frac{(K+1)N + 1}{KN + 2}$$

produces minimum variance in $\hat{\sigma}_a^2$. For the range of N and $\ell$ used in this study, the value of $r_h$ also produces minimum variance for $\tilde{\sigma}_a^2$.
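A small sketch of how this allocation could be applied in practice follows. The function names are hypothetical, and restricting the candidate values of r to divisors of N that leave $4 \le \ell \le N/2$ is an assumption carried over from the range of $\ell$ used in this study, not part of Hammersley's result.

```python
def hammersley_r(N, K):
    """Hammersley's (generally non-integral) r_h for a known ratio K."""
    return ((K + 1) * N + 1) / (K * N + 2)


def hammersley_allocation(N, K):
    """Return (r, l): the admissible divisor r of N nearest r_h, and l = N / r."""
    target = hammersley_r(N, K)
    # Assumed search range: r >= 2 and l = N / r >= 4.
    candidates = [r for r in range(2, N // 4 + 1) if N % r == 0]
    r = min(candidates, key=lambda c: abs(c - target))
    return r, N // r


if __name__ == "__main__":
    for K in (0.1, 0.5, 5.0):
        print(K, hammersley_r(12, K), hammersley_allocation(12, K))
```

For N = 12 and K = .5 the formula gives $r_h$ = 2.375, so the nearest admissible divisor is r = 2 and the corresponding number of classes is $\ell$ = 6 = N/2.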
The $r_h$ proposed by Hammersley has two unpleasant features. First, for some combinations of N and K, the power of the specified test of hypothesis is very low. For example, if K = .5 and N = 12, the power of a test of $H_0$: $\sigma_a^2 = 0$ is only .2561.

The second and perhaps more serious feature is that Hammersley's solution for $r_h$ requires a knowledge of the ratio $\sigma_a^2/\sigma_e^2$ prior to conducting the intended analysis. In an environment such as the flexural rigidity experiment, where the general magnitudes of $\sigma_a^2$ and $\sigma_e^2$ would be known from previous experiments on similar products, this requirement may not be serious. However, for a one-time-only experiment, or an evaluation of a new process, this requirement may be completely unreasonable.

The results of the present study indicate that power is maximized for small $\ell$ while variance is minimized when $\ell$ assumes its maximum value of N/2. But it has already been shown that power is not a major consideration for N > 16 if $\ell \neq N/2$. It would appear then that $\ell$ should be selected very near to but not equal to N/2.

Based on these considerations it appears that

$$\ell_g = \left[\frac{N}{2} - 1\right]^-$$

such that $N/\ell_g$ is an integer is the "best" choice for $\ell$; that is, $\ell_g$ is the next smaller integral divisor of N below N/2. Here $[x]^-$ denotes the largest integral divisor of N that does not exceed x. As an example, for N = 20,

$$\ell_g = \left[\frac{20}{2} - 1\right]^- = [9]^- = 5$$

is the best choice for $\ell$. (See Table III.)
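The selection rule for $\ell_g$ amounts to a short search over the divisors of N. The sketch below uses a hypothetical function name and reads the rule as "the largest integral divisor of N that does not exceed N/2 - 1", which is the reading implied by the N = 20 example.

```python
def l_g(N):
    """Largest integral divisor of N that does not exceed N/2 - 1."""
    bound = N / 2 - 1
    divisors = [d for d in range(1, N + 1) if N % d == 0 and d <= bound]
    if not divisors:
        raise ValueError("N is too small for the rule to apply")
    return max(divisors)


if __name__ == "__main__":
    for N in (12, 16, 20, 40, 50, 80, 100):
        print(N, l_g(N))
```

Because the rule depends only on N, it requires no prior knowledge of K.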
Table III shows a comparison of $\ell_h$ ($\ell_h = N/r_h$) and $\ell_g$ for various values of N. It also shows the power, variance, and bias generated for each choice of $\ell_g$ and $\ell_h$ in a range of K values. It can be seen that $\ell_h$ increases with K to its maximum value of N/2 while $\ell_g$ is fixed for a given N. Also, when $\ell_h = N/2$, the power of the test is small for small N. In fact, for N as large as 100, the power using $\ell_h$ may be less than .9, while power for $\ell_g$ never falls below .9 for any N. As was expected, the variance of $\tilde{\sigma}_a^2$ using $\ell_h$ is considerably less than the variance acquired using $\ell_g$, since $\ell_h$ was derived as the minimum variance choice of $\ell$.

Generally speaking, the bias of the estimator when $\ell = \ell_g$ is less than or equal to the bias when $\ell = \ell_h$, the only exceptions being when K = .1 and N = 80 and 100. The bias for both $\ell = \ell_g$ and $\ell = \ell_h$ is generally less than three percent of the true value of $\sigma_a^2$ if $K \ge 1.0$, and less than one percent for $K \ge 1.0$ and $N \ge 20$.

Again it seems that the method of selecting $\ell$ depends on the desired results of the original analysis. $\ell_h$ will always produce minimum variance in the estimate of $\sigma_a^2$ but requires a knowledge of the ratio $K = \sigma_a^2/\sigma_e^2$. If K is known and a minimum variance estimator is desired, this is certainly the best method of choosing $\ell$.

If a powerful test of hypothesis is desired, $\ell_g$ gives a much more powerful test for most combinations of N and K than will $\ell_h$. If nothing is known of K, $\ell_g$ gives a powerful test and a relatively small variance.
IV. CONCLUSIONS AND RECOMMENDATIONS

A. CONCLUSIONS

It may be concluded that, with one exception, $\ell_g$ is the "best" method of choosing the number of classes for Model II analysis of variance when N is fixed. The exception occurs in the case where K is known and a minimum variance estimator of $\sigma_a^2$ is desired without regard to the power of the test of hypothesis that $\sigma_a^2 = 0$. In this case $\ell_h$ appears best. The use of $\ell = \ell_g$ assures a very powerful test of hypothesis and will yield a small, but not minimum, variance in the estimator. For most combinations of N and K, $\ell_g$ also produces minimum bias in $\tilde{\sigma}_a^2$.

If $N \ge 20$ and $K \ge 1.0$, the bias of $\tilde{\sigma}_a^2$ is so small as to be negligible. In such cases, the use of the truncated estimator of $\sigma_a^2$ has no significant influence on the results of the analysis except to cause a small decrease in variance. As N and/or K increase, this decrease in variance appears to tend toward zero.


B. RECOMMENDATIONS FOR FURTHER STUDY

It is suggested that a similar study of variance estimators be conducted for values of N greater than 100 for the full range of K values studied here. Such a study might also investigate values of K less than the minimum value of .1 used in this study.

A much more difficult task that could follow the same general approach would be an investigation of two-way and multi-way analysis in an effort to determine the best number of experiments for each class to provide minimum variance in the variance estimators and maximum power for a specified test of hypothesis.
APPENDIX A

THIS PROGRAM IS DESIGNED TO COMPUTE THE VARIANCE, POWER AND BIAS FOR THE Y+ ESTIMATOR OF THE BETWEEN CLASS VARIANCE FOR THE BALANCED, ONE-WAY ANALYSIS OF VARIANCE, MODEL II.

EXPLANATION OF SYMBOLS:

VAR IS THE TRUE WITHIN CLASS VARIANCE.

VARA IS THE TRUE BETWEEN CLASS VARIANCE.

L IS THE NUMBER OF CLASSES.

IR IS THE NUMBER OF EXPERIMENTS IN EACH CLASS.

XK IS THE RATIO OF THE BETWEEN CLASS AND WITHIN CLASS VARIANCES.

N IS THE TOTAL NUMBER OF EXPERIMENTS. N=L*IR.

VART IS THE Y+, OR POSITIVE TRUNCATION, OF THE ESTIMATE OF THE BETWEEN CLASS VARIANCE.

VARR IS THE MINIMUM VARIANCE UNBIASED ESTIMATOR OF THE BETWEEN CLASS VARIANCE.

XMEAN IS THE EXPECTED VALUE OF Y+.

POW IS THE POWER OF THE TEST OF HYPOTHESIS THAT THE TRUE BETWEEN CLASS VARIANCE IS ZERO.

XC IS THE F-STATISTIC FOR ALPHA=.05 AND L-1 AND L*(IR-1) DEGREES OF FREEDOM, USED IN COMPUTING THE POWER.

THE SUBROUTINE BDTR COMPUTES THE PROBABILITY THAT THE RANDOM VARIABLE U, DISTRIBUTED ACCORDING TO THE BETA-DISTRIBUTION WITH PARAMETERS A AND B, IS LESS THAN OR EQUAL TO X; BDTR(X,A,B)

THE FUNCTION EYPLUS COMPUTES THE EXPECTED VALUE OF Y+.

THE FUNCTION VYPLUS COMPUTES THE VARIANCE OF Y+.
C     READ L, IR AND THE CRITICAL F VALUE XC.  A CARD WITH L NOT
C     POSITIVE ENDS THE RUN.
 1002 READ(5,105) L,IR,XC
      VAR=1.
      VARA=0.
      IF(L) 1000,1000,1001
 1001 WRITE(6,100)
      XN=L-1
      XM=L*(IR-1)
C     STEP THE BETWEEN CLASS VARIANCE FROM 1 TO 20.
      DO 3000 I=1,20
      VARA=FLOAT(I)
C     A AND B ARE DEFINED IN EQ. (2.11).
      A=(VAR+FLOAT(IR)*VARA)/FLOAT(IR*(L-1))
      B=VAR/FLOAT(L*IR*(IR-1))
      XMEAN=EYPLUS(XM,XN,A,B)
      BIAS=XMEAN-VARA
      VART=VYPLUS(XM,XN,A,B,XMEAN)
C     VARIANCE OF THE UNTRUNCATED ESTIMATOR, EQ. (2.17).
      VARR=(2./FLOAT(IR**2))*((VAR+FLOAT(IR)*VARA)**2/
     1FLOAT(L-1)+VAR**2/FLOAT(L*(IR-1)))
C     POWER OF THE TEST, EQ. (2.19).
      X=1./(1.+XN*XC*VAR/(XM*(VAR+FLOAT(IR)*VARA)))
      CALL BDTR(X,XN,XM,P,D,IER)
      POW=P
      N=L*IR
      XK=VARA/VAR
      WRITE(6,101) N,L,XK,VARA,POW,VARR,VART,BIAS
 3000 CONTINUE
      GO TO 1002
 1000 CONTINUE
  100 FORMAT('   N   L      XK    VARA   POWER  VAR(UNTRUN)',
     1'  VAR(TRUN)  BIAS(TRUN)'//' ')
  101 FORMAT(' ',I3,2X,I2,3F8.4,2X,F8.4,2X,F8.4,2X,F8.4)
  105 FORMAT(I2,I4,F6.2)
      END


      FUNCTION EYPLUS(XM,XN,A,B)
C     EXPECTED VALUE OF Y+, EQ. (2.9).
      C=A/(A+B)
      H=XM/2.
      E=XN/2.
      CALL BDTR(C,H,E,P,D,IER)
      BET=P
      F=H+1.
      G=E-1.
      IF(G) 5,5,10
    5 EYPLUS=0.
      RETURN
   10 CALL BDTR(C,F,G,P,D,IER)
      BET1=P
      EYPLUS=XN*A*BET-XM*B*BET1
      RETURN
      END
      FUNCTION VYPLUS(XM,XN,A,B,XMEAN)
C     VARIANCE OF Y+, EQ. (2.15).
      IF(XMEAN) 10,5,10
    5 VYPLUS=0.
      RETURN
   10 C=XM/2.
      H=XN/2.
      E=A/(A+B)
      F=XN*A**2+XM*B**2
      CALL BDTR(E,C,H,P,D,IER)
      BET=P
      G=XN*A-XM*B
      YM=XM/2.
      YN=XN/2.
      GANM=YM+YN
C     COG IS THE BETA FUNCTION WITH PARAMETERS XM/2 AND XN/2.
      CALL GMMMA(YM,GX,IER)
      GAM=GX
      CALL GMMMA(YN,GX,IER)
      GAN=GX
      CALL GMMMA(GANM,GX,IER)
      COG=GAM*GAN/GX
      P=E**C
      R=(1.-E)**(H-1.)
C     THIRD TERM OF EQ. (2.15) IS DIVIDED BY THE BETA FUNCTION.
      XK=P*R*2.*B/COG
C     SECOND TERM OF EQ. (2.15) IS (NA-MB)*E(Y+), I.E. G*XMEAN.
      VYPLUS=2.*F*BET+G*XMEAN+2.*XK*(A-B)-XMEAN**2
      RETURN
      END